npj Genomic Medicine — Latest Matching Preprints

1

Targeted BRCA1/BRCA2 Sequencing in a Bangladeshi Clinically Referred Cohort Identifies Candidate BRCA1 Loss-of-Function Variants and a Multi-Exon Deletion-Like CNV Signal

Al Sium, S. M.; Banu, T. A.; Goswami, B.; Naser, S. R.; Habib, M. A.; Akter, S.; Ara, M. H.; Al Din, S. M. S.; Nafisa, A.; Nayem, M. R.; Rabbi, M. F. A.; Sarkar, M. M. H.; Khan, M. S.

2026-05-20 oncology 10.64898/2026.05.11.26352643 medRxiv

Top 0.1%

17.3%

Background: Population-relevant BRCA1/BRCA2 data from Bangladesh are scarce, creating challenges for hereditary breast and ovarian cancer variant interpretation, counseling, and follow-up testing. We examined a clinically referred Bangladeshi cohort to characterize assay-derived BRCA1/BRCA2 short variants, sequencing-depth performance, and copy-number findings in a conservative pilot framework. Methods: Twenty-three de-identified blood-derived DNA samples were assessed using a targeted BRCA1/BRCA2 next-generation sequencing workflow. Downstream analysis used assay-generated short-variant, coverage, and CNV outputs, with coordinates reported on hg19/GRCh37. Short variants were evaluated from high-confidence PASS/VCC-H calls, and CNV review incorporated both target-region and amplicon-level copy-number patterns. Results: After removal of four low-VAF review observations, the primary germline-compatible dataset comprised 304 short-variant observations representing 34 unique variants. Both BRCA1 and BRCA2 contributed comparable variant burdens, while the overall profile was mainly composed of missense and synonymous changes. Six sample-specific heterozygous BRCA1 truncating candidates were observed, including five frameshift variants and one stop-gain variant. Protein-level mapping placed these events across the central-to-C-terminal portion of BRCA1. Sequencing depth was consistently high across the targeted regions, with all 4,255 amplicon-sample measurements exceeding 280x and 99.91% reaching at least 500x. Copy-number analysis highlighted one candidate BRCA1 multi-exon deletion-like event involving exons 15-20 in BCSIR-BRCA-21, with unresolved partial exon 14 involvement. Conclusions: This study provides an initial Bangladesh-focused targeted BRCA1/BRCA2 dataset and identifies candidate short-variant and CNV findings for validation. 
These findings should be interpreted as analytical candidates only and require confirmatory testing and expert clinical curation before any clinical application. The cohort is referral-enriched and should not be used to infer population prevalence.

2

The Genetic Landscape and Epidemiological Characteristics of Inherited Retinal Diseases in the Chinese Population

Zeng, B.; Cui, Z.; Zhou, S.; Dai, W.

2026-05-29 ophthalmology 10.64898/2026.05.27.26354224 medRxiv

Top 0.1%

17.2%

Background: Inherited Retinal Diseases (IRDs) are a group of genetically heterogeneous blinding conditions. Major global genomic reference databases are disproportionately enriched for individuals of European ancestry. This underrepresentation creates a significant bias that impedes the accuracy of genetic diagnosis in the Chinese population. This study aims to address this limitation by constructing a comprehensive genetic landscape of IRDs using large-scale deep-sequencing data from a large Chinese cohort. Methods: The study leveraged variant data primarily from 10,588 individuals in the China Metabolic Analytics Project (ChinaMAP) and cross-referenced findings against multiple national and international databases. We systematically curated variants within a targeted panel of 291 IRD-associated genes. Variant pathogenicity was assessed using a comprehensive pipeline integrating InterVar-automated classification based on 2015 American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines, ClinVar evidence (review status ≥ 1 star), and manual literature curation. We delineated the mutational spectrum, identified population-enriched pathogenic/likely pathogenic (P/LP) variants, and analyzed the distribution characteristics of IRD-associated highly-mutated genes. Furthermore, we calculated the carrier frequencies (CF) and genetic prevalence (GP) of autosomal recessive (AR)-IRD genes in the Chinese population. Results: The study revealed a highly concentrated genetic landscape for AR-IRDs in the Chinese population, with ABCA4 and USH2A emerging as the primary drivers of the genetic burden. This finding aligns with previous Chinese cohorts but contrasts with global databases, where genes such as the X-linked RPGR are more prevalent. In contrast, autosomal dominant (AD)-IRDs exhibited high locus heterogeneity, with pathogenic variants dispersed across numerous genes (e.g., COL2A1 and MFN2). 
We identified a series of P/LP variants that were either high-frequency or significantly enriched in the Chinese population, such as CNGB1 (p.P530R) and specific recurrent alleles in ABCA4 and CYP4V2. The estimated cumulative CF for AR-IRDs was 1 in 5.60, and the theoretical total GP was 1 in 2,624.67, based on the ChinaMAP data. Conclusion: By integrating the ChinaMAP dataset with diverse genomic resources, this study provides a genetic landscape of IRDs in the Chinese population. Our analysis shows a concentrated mutational spectrum in AR-IRDs, contrasting with the pronounced heterogeneity in AD-IRDs. These findings, including population-specific pathogenic variants and refined prevalence estimates, provide a resource for precision diagnostics, genetic counseling, expanded carrier screening (ECS), and public health policy development in China.
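
The abstract above reports a cumulative carrier frequency (CF) and a total genetic prevalence (GP) but does not state the formulas used. A minimal sketch of one common approach under Hardy-Weinberg assumptions follows; the function names and the per-gene allele frequencies in the usage note are hypothetical, and the authors' actual pipeline may differ.

```python
from math import prod


def gene_carrier_frequency(q: float) -> float:
    """Heterozygous carrier frequency 2pq for one gene, where q is the
    summed P/LP allele frequency (assumes Hardy-Weinberg equilibrium)."""
    return 2.0 * q * (1.0 - q)


def gene_genetic_prevalence(q: float) -> float:
    """Expected frequency of biallelic (affected) genotypes, q squared."""
    return q * q


def cumulative_carrier_frequency(allele_freqs: list[float]) -> float:
    """Probability of carrying a P/LP allele in at least one gene,
    assuming the genes are independent (an assumption of this sketch)."""
    return 1.0 - prod(1.0 - gene_carrier_frequency(q) for q in allele_freqs)


def total_genetic_prevalence(allele_freqs: list[float]) -> float:
    """Sum of per-gene genetic prevalences across all AR genes."""
    return sum(gene_genetic_prevalence(q) for q in allele_freqs)


if __name__ == "__main__":
    # Hypothetical summed P/LP allele frequencies for two genes,
    # not values taken from the study.
    toy_freqs = [0.01, 0.005]
    cf = cumulative_carrier_frequency(toy_freqs)
    gp = total_genetic_prevalence(toy_freqs)
    print(f"cumulative CF: 1 in {1 / cf:.2f}")
    print(f"total GP: 1 in {1 / gp:.2f}")
```

Reporting both quantities as "1 in N" mirrors the format used in the abstract; the inputs would be per-gene sums of P/LP allele frequencies from the cohort.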

3

In vitro splice-switching oligonucleotide rescues aberrant GFM2 pseudoexon inclusion and restores mitochondrial activity

Gross, S.; Birnbaum, R.; Shaul Lotan, N.; Mor-Shaked, H.; Manor, J.; Shaag, A.; Rosenbluh, C.; Levy-Memo, A.; Yanovsky-Dagan, S.; Saada, A.; Harel, T.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.28.26354078 medRxiv

Top 0.1%

10.0%

Background: Biallelic variants in GFM2, encoding mitochondrial elongation factor G2 (mtEFG2), a GTPase involved in the termination stage of mitochondrial translation, cause autosomal recessive combined oxidative phosphorylation deficiency. Noncoding structural variants may be missed by exome sequencing but can disrupt splicing and provide opportunities for variant-specific therapeutic rescue. We investigated the molecular mechanism underlying suspected Leigh syndrome in an infant with mitochondrial disease and evaluated whether splice-switching oligonucleotide (SSO) treatment could correct the pathogenic splicing defect. Methods: The proband underwent exome sequencing followed by short-read and long-read whole genome sequencing. RNA sequencing, reverse-transcription PCR, quantitative PCR, and cycloheximide treatment were used to characterize the effect of the identified intronic duplication on GFM2 splicing and transcript stability. Patient-derived fibroblasts were treated with SSOs targeting the aberrant splice junction. Rescue was assessed by RNA studies, western blotting, and spectrophotometric measurement of cytochrome c oxidase (COX). Results: Whole genome sequencing identified a paternally-inherited GFM2 missense variant, NM_032380.5:c.2195C>T p.(Pro732Leu), in trans to a maternally-inherited 221-nucleotide intronic duplication, NM_032380.5:c.2029-741_2029-521dup. RNA studies revealed an 87-nucleotide pseudoexon, generated by activation of a cryptic acceptor splice site within the duplicated sequence. The resulting transcript harbored a premature termination codon (PTC) and underwent nonsense-mediated decay, as confirmed by cycloheximide rescue. Together with reduced mtEFG2 protein levels on western blot, the findings supported a loss-of-function mechanism. 
Enzymatic analysis of affected fibroblasts showed reduced activity of the mtDNA-dependent complex IV subunit COX, with preservation of the nuclear-encoded complex II enzyme succinate dehydrogenase and the control enzyme citrate synthase, consistent with impaired mitochondrial translation. An SSO targeting the aberrant intron-pseudoexon junction nearly abolished pseudoexon inclusion, restored correctly spliced GFM2 transcript from the duplication-containing allele, increased mtEFG2 protein levels, and significantly improved COX activity. Conclusions: This study identifies a pathogenic intronic GFM2 duplication that causes mitochondrial disease through pseudoexon activation and nonsense-mediated decay. The findings demonstrate the value of integrated genome and transcriptome analysis for exome-negative mitochondrial disease and provide in-vitro proof of concept that SSOs can restore transcript processing, protein expression, and mitochondrial respiratory-chain function in patient-derived cells.

4

Stratified evaluation of blood RNA sequencing in a rare disease cohort

Duzenli, T.; Durmus, S.; Kaya, H. E.; Sevilgen, F. E.; Kayhan, G.; Cakir, T.; Ergun, M. A.

2026-05-28 genetic and genomic medicine 10.64898/2026.05.27.26353804 medRxiv

Top 0.1%

9.8%

Background: RNA sequencing (RNA-seq) is increasingly recognized as a complementary tool to DNA-based sequencing for improving the diagnostic yield in Mendelian disorders. However, how the diagnostic performance of RNA-seq varies across molecularly and phenotypically distinct patient subgroups remains poorly defined. This study aimed to evaluate and compare the diagnostic utility of RNA-seq across three stratified groups of patients with non-diagnostic exome sequencing. Methods: We performed RNA-seq on whole blood samples from 90 patients with suspected Mendelian disease in whom clinical exome or whole-exome sequencing had failed to establish a molecular diagnosis. Patients were prospectively stratified into three groups of 30: (i) patients with a candidate variant of uncertain significance (VUS) with predicted splicing impact (Group 1), (ii) patients with a specific clinical pre-diagnosis but no identified pathogenic variant (Group 2), and (iii) patients without a specific pre-diagnosis or candidate variant (Group 3). Aberrant splicing, gene expression outliers, and allele-specific expression were analyzed using multiple bioinformatic tools and compared against a GTEx-derived control cohort. Results: RNA-seq contributed to a molecular diagnosis in 29 of 88 evaluable patients (32.9%). Diagnostic yield differed substantially across groups: 82.8% (24/29) in Group 1, 6.9% (2/29) in Group 2, and 10% (3/30) in Group 3. In Group 1, RNA-seq enabled reclassification of candidate VUS through direct demonstration of aberrant splicing events. In Group 2, RNA-seq identified a somatic mosaic ACTB variant missed by exome sequencing and reclassified a previously deprioritized APPL1 VUS. 
In Group 3, a deep intronic pseudoexon-activating variant in IGBP1 was identified in two siblings with severe microcephaly, providing evidence for a candidate X-linked microcephaly gene, and a pathogenic RNU4-2 variant was detected in a patient with ReNU syndrome, a non-protein-coding gene not captured by standard exome sequencing. Conclusions: RNA-seq has the highest diagnostic utility when applied to evaluate candidate splice variants identified by prior DNA testing but also provides independent diagnostic value in patients without candidate variants. The systematic comparison across stratified patient groups supports the integration of RNA-seq into clinical genomic workflows and highlights the need for standardized analytic frameworks.

5

Ancestry-stratified variant classification in monogenic diabetes genes: annotation coverage and differential curation burden

Dario, P.

2026-04-07 genetic and genomic medicine 10.64898/2026.04.06.26350230 medRxiv

Top 0.1%

7.2%

Variant databases ClinVar and gnomAD are the backbone of clinical variant interpretation, but their population composition is skewed toward European ancestry. Whether this skew creates systematic classification disadvantages for non-European patients with monogenic diabetes has not been examined at the database level. ClinVar variant_summary (GRCh38, April 2026; 4,421,188 variants) was cross-referenced with gnomAD v4.0 genome data for 17 monogenic diabetes genes. Annotation coverage and variant classification rates were computed stratified by genetic ancestry group (AFR, AMR, EAS, SAS, MID, NFE, FIN, ASJ). Of 14,691 gnomAD variants across the 17 genes, only 29.7% had any ClinVar classification (range: 12.7%-61.3% by gene). Among classified variants, non-Finnish European (NFE) variants had the highest variant of uncertain significance (VUS) rate (32.1%) and the lowest benign/likely benign fraction (41.6%), consistent with a large submission volume without functional follow-up. African-ancestry (AFR) variants showed the second-highest VUS rate (29.2%), not statistically distinguishable from NFE after Bonferroni correction, while all other non-European groups had significantly lower rates (all p < 0.001). GCK showed a pattern inversion - non-European VUS rate (18.5%) exceeding European (15.0%) - consistent with progressive reclassification in European populations absent in non-European cohorts. Annotation coverage and VUS divergence were uncorrelated (r = -0.15, p = 0.57). The primary equity problem is a 70% annotation gap combined with a non-European curation deficit, not a simple VUS excess. Ancestry-stratified evaluation of ClinGen Variant Curation Expert Panel (VCEP) criteria performance is warranted across disease domains.

6

Differential causative effects of germline pathogenic variants in MUTYH and PALB2 in a patient with colorectal polyposis and breast cancer

Camacho Valenzuela, J.; Pelletier, D.; Polak, P.; Fu, L.; Hamel, N.; Domecq, C.; Ahmed, A.; Robles-Espinoza, C. D.; Foulkes, W. D.

2026-05-25 genetic and genomic medicine 10.64898/2026.05.15.26352890 medRxiv

Top 0.1%

6.3%

Purpose: Patients carrying Germline Pathogenic Variants (GPVs) in multiple cancer susceptibility genes (CSGs) can be described within the context of Multi-locus Inherited Neoplasia Allele Syndrome (MINAS). The role of each GPV is typically interpreted based on clinical phenotypes. Here, we used tumor sequencing, particularly mutational signatures, to investigate the contribution of GPVs in MUTYH and PALB2 to colorectal polyposis and breast cancer in a single patient at a molecular level. Methods: We analyzed tumor sequencing data, including mutational signatures and genomic scars, of a breast tumor and a colorectal polyp from a patient with biallelic GPVs in MUTYH and a heterozygous GPV in PALB2. Results: The colorectal polyp showed a dominant contribution of MUTYH-associated Base Excision Repair deficiency (BERd) mutational signatures, with no evidence of Homologous Recombination Repair Deficiency (HRD). In contrast, the breast tumor showed both MUTYH-driven BERd and HRD-associated signatures, including SBS3, ID6 and an elevated HRD score, despite the absence of a detectable second hit in PALB2. These findings suggest a differential contribution from the CSGs, with MUTYH contributing to both lesions and PALB2 contributing specifically to the breast tumor. The observed pattern does not align with the additive or synergistic models described in MINAS. Conclusions: Our study provides evidence that mutational signatures can elucidate the contribution of multiple CSGs to tumorigenesis within a single patient. These findings extend current interpretations of MINAS beyond additive or synergistic phenotypes, which may help to better understand tumor etiology, with potential clinical implications, including eligibility for targeted therapies.

7

Pharmacogenetic Characterization of Cytochrome P450 Genes involved in Psychotropic Medication Metabolism in a Cohort of Patients with Prader-Willi Syndrome

Moreno-Armengol, A.; Pareja, R.; Hernandez-Lazaro, A.; Capel, L.; Corripio, R.; Caixas, A.; Baena, N.

2026-05-18 pharmacology and therapeutics 10.64898/2026.05.09.26352521 medRxiv

Top 0.1%

6.3%

Prader-Willi syndrome (PWS) is a rare multisystemic disorder characterized by obesity, endocrine dysfunctions, and psychiatric comorbidities, which entail frequent use of psychotropic medications. Patients frequently show atypical responses to standard dosages of psychiatric drugs. Pharmacogenetics could be part of the reason for this situation, potentially offering a valuable tool for individualized treatment. This study analyzed allelic and phenotypic frequency distributions of five of the main cytochrome P450 enzymes (CYP2D6, CYP2B6, CYP2C19, CYP2C9, CYP3A4) involved in psychiatric drug metabolism in 47 patients with genetically confirmed diagnosis of PWS and compared them to reference frequencies in the general European population. Allelic frequency comparisons between the European reference population and the overall PWS cohort revealed a significant global difference for CYP2B6, with CYP2C19 and CYP2D6 showing trends toward significance. Although no global allelic differences remained significant after false discovery rate correction, post-hoc analyses consistently identified an enrichment of reduced- or non-functional alleles CYP2B6*19 and CYP2D6*10 in patients with PWS. Predicted metabolizer phenotype analyses showed a significant shift toward intermediate metabolizers of CYP3A4 in the PWS cohort, with corresponding depletion of normal metabolizers. Subgroup analyses indicated that allelic differences were more pronounced in maternal uniparental disomy and non-deletion subtypes, particularly for CYP2B6, although no significant differences were observed between PWS genetic subtypes. Overall, results imply potential differences in metabolizing activity in PWS patients, and subsequent implications in drug efficacy and tolerability. These results support the idea that pharmacogenetic testing may improve therapeutic decision-making in PWS for psychiatric treatment. Larger studies are needed to confirm these preliminary results.

8

Documented clinical genetic testing among carriers of hereditary breast and ovarian cancer variants: Ancestry and socioeconomic disparities in the All of Us research program

Yerukala Sathipati, S.; Scott, H.

2026-06-10 oncology 10.64898/2026.06.09.26355262 medRxiv

Top 0.1%

5.0%

Importance: Hereditary breast and ovarian cancer (HBOC) variant carriers benefit from risk-reducing interventions, but only if identified. The extent to which carriers are clinically recognized, and whether recognition is equitable across diverse populations, is poorly characterized in a single large U.S. cohort. Objective: To estimate P/LP HBOC carrier prevalence across genetic ancestry groups, quantify documented clinical genetic testing among carriers, and evaluate ancestry and socioeconomic disparities in testing. Design, Setting, and Participants: Cross-sectional analysis of the All of Us Research Program Controlled Tier (Curated Data Repository v8/C2024Q3R9), comprising participants with short-read whole genome sequencing and linked electronic health record (EHR) and survey data. Carriers were ascertained from research genomic data independent of clinical testing. Exposures: Genetically inferred ancestry (African [AFR], Admixed American [AMR], East Asian [EAS], European [EUR], Middle Eastern [MID], South Asian [SAS]); self-reported household income and educational attainment. Main Outcomes and Measures: (1) Carrier prevalence with Wilson 95% CIs; (2) documented clinical genetic testing (procedure codes) among carriers; (3) adjusted odds of documented testing among women, by ancestry, before and after socioeconomic adjustment, using multivariable logistic regression. Results: Among 414,830 participants, P/LP HBOC carrier prevalence was 1.42% (95% CI, 1.38-1.45) overall and similar across ancestry groups (AFR 1.24%, AMR 1.32%, EAS 1.19%, EUR 1.52%, MID 1.68%, SAS 1.33%; overlapping CIs). Among 250,071 women in the testing analysis, documented clinical genetic testing was rare: only 74 of 5,878 carriers overall (1.3%) and 59 of 3,572 European-ancestry carriers (1.7%) had a documented test, with counts below reportable thresholds in all other ancestry groups. 
African-ancestry women had lower adjusted odds of documented testing than European-ancestry women (Model 1 adjusted odds ratio [aOR], 0.32; 95% CI, 0.27-0.39), an association that attenuated but persisted after adjustment for income and education (Model 2 aOR, 0.48; 95% CI, 0.40-0.58; P < 0.001); Admixed American women also had reduced adjusted odds (aOR, 0.71; 95% CI, 0.61-0.84). Lower income and lower education were independently and dose-dependently associated with lower testing odds (income <$25,000 aOR, 0.46; high-school education aOR, 0.54). Conclusions and Relevance: High-risk HBOC variant carriers are present across all ancestry groups at similar frequencies, yet documented clinical genetic testing was disparate in the different ancestry groups. African-ancestry women experience a testing gap that is not fully explained by socioeconomic position, implicating structural barriers in access and referral. Population-level strategies that decouple carrier identification from current referral pathways may be required to close this gap.
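
The abstract above names Wilson 95% CIs as the interval method for carrier prevalence. As a minimal sketch of how a Wilson score interval is computed, the function below is applied to the reported 74 of 5,878 carriers with a documented test; the function name is hypothetical, and applying the interval to this particular proportion is an illustration, not a result reported by the authors.

```python
from math import sqrt


def wilson_interval(successes: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    p_hat = successes / total
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p_hat + z2 / (2.0 * total)) / denom
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / total + z2 / (4.0 * total * total)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Counts taken from the abstract: 74 of 5,878 carriers had a documented test.
    low, high = wilson_interval(74, 5878)
    print(f"proportion: {74 / 5878:.4f}, Wilson 95% CI: {low:.4f} to {high:.4f}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves reasonably for rare outcomes, which is relevant when proportions are near 1%.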

9

FRMPD4, a causal gene for intellectual disability and epilepsy, is associated with X-linked non-syndromic hearing loss

Liedtke, D.; Rak, K.; Schrode, K. M.; Hehlert, P.; Chamanrou, N.; Bengl, D.; Katana, R.; Heydaran, S.; Doll, J.; Han, M.; Nanda, I.; Senthilan, P. R.; Juergens, L.; Bieniussa, L.; Voelker, J.; Neuner, C.; Hofrichter, M. A.; Schroeder, J.; Schellens, R. T.; de Vrieze, E.; van Wijk, E.; Zechner, U.; Herms, S.; Hoffmann, P.; Mueller, T.; Dittrich, M.; Bartsch, O.; Krawitz, P. M.; Klopocki, E.; Shehata-Dieler, W.; Maroofian, R.; Wang, T.; Worley, P. F.; Goepfert, M. C.; Galehdari, H.; Lauer, A. M.; Haaf, T.; Vona, B.

2026-03-30 genetic and genomic medicine 10.64898/2026.03.27.26349271 medRxiv

Top 0.1%

5.0%

Background: Understanding the phenotypic spectrum of disease-associated genes is essential for accurate diagnosis and targeted therapy. FRMPD4 (FERM and PDZ Domain Containing 4) has previously been associated with intellectual disability and epilepsy. However, its potential role in non-syndromic hearing loss has not been explored. Methods: We performed genetic analysis in two unrelated families presenting with non-syndromic sensorineural hearing loss, identifying maternally inherited missense variants in FRMPD4. Clinical phenotyping included audiological assessment and evaluation for neurodevelopmental involvement. Cross-species expression analyses were conducted in Drosophila, zebrafish, and mouse. Functional characterization included quantitative evaluation of sound-evoked responses in Drosophila nicht gut hoerend (ngh) mutants, assessment of neuronal development and acoustic startle responses in zebrafish loss of function models, and morphological cochlear analyses with auditory brainstem response measurements in knockout mice. Results: Three affected males from two unrelated families presented with prelingual, bilaterally symmetrical sensorineural hearing loss, with confirmed congenital onset in one individual and no evidence of neurodevelopmental abnormalities. Cross-species analyses demonstrated evolutionarily conserved expression of FRMPD4 in auditory structures. In Drosophila, quantitative analysis of sound-evoked responses in ngh mutants revealed impaired auditory function. Zebrafish loss of function models exhibited reduced neuronal populations in the otic vesicle and posterior lateral line, abnormal neuromast development, and diminished acoustic startle responses. In mice, Frmpd4 knockout resulted in high-frequency hearing loss and cochlear abnormalities consistent with the human phenotype. 
Conclusions: Our findings expand the phenotypic spectrum of FRMPD4 to include non-syndromic sensorineural hearing loss and establish its evolutionarily conserved role in auditory function. These results have direct implications for genetic diagnosis and variant interpretation in patients with hearing loss.

10

Genotype-Based Severity Scoring System in Wolfram Syndrome

Oiknine, L.; Tang, A. F.; Urano, F.

2026-03-26 genetic and genomic medicine 10.64898/2026.03.24.26349216 medRxiv

Top 0.1%

4.2%

Wolfram syndrome is a rare genetic disorder characterized by antibody-negative early-onset atypical diabetes mellitus, optic nerve atrophy, sensorineural hearing loss, diabetes insipidus (arginine vasopressin deficiency), and progressive neurodegeneration, with significant variability in disease severity. We assessed the accuracy of a genotype-based severity scoring system to predict the onset of cardinal symptoms in Wolfram syndrome. This system is based on the type of WFS1 variants (in-frame or out-of-frame) and their location relative to transmembrane domains. Severity scores were assigned to 324 patients with documented onset ages for diabetes mellitus, optic atrophy, hearing loss, and diabetes insipidus. Our analysis revealed a clear correlation between severity scores and earlier onset of diabetes mellitus and optic atrophy. Patients with in-frame variants outside transmembrane domains exhibited milder symptoms, especially those with the WFS1 c.1672C>T (p.Arg558Cys) variant, whereas those with out-of-frame variants showed the earliest onset. Severity scores 3 and 4 did not follow the expected progression, suggesting that transmembrane domain involvement in both alleles may result in greater severity. These findings suggest that this scoring system provides valuable insights into the progression of Wolfram syndrome and may guide clinical care. Further refinement may improve its utility for predicting the onset of non-diabetic symptoms.

11

Rare neurological and neurodevelopmental variants in ALS link to onset, survival and family history

O'Donoghue, C.; Kacar, E.; Gomes, T.; Costello, E.; Pender, N.; Peelo, C.; Ryan, M.; Heverin, M.; Byrne, S.; Bede, P.; Hardiman, O.; McLaughlin, R. L.; Byrne, R. P.

2026-06-10 genetic and genomic medicine 10.64898/2026.06.09.26354977 medRxiv

Top 0.1%

4.0%

Background: Neurological, neuropsychiatric, and neurodevelopmental disorders cluster in ALS families, sharing a common genetic architecture with ALS. Pathogenic variants in genes associated with other neurological, neurodevelopmental, or neuropsychiatric disorders may also co-occur in ALS and modify phenotype. We have sought to determine the prevalence and clinical pattern of likely-pathogenic/pathogenic (LP/P) non-ALS neurological, neurodevelopmental, and neuropsychiatric variants, alone and in combination with ALS-gene variants, in two large ALS cohorts. Methods: Whole-genome sequencing (WGS) of 469 Irish and 774 Answer ALS people with ALS (pwALS) was analysed for ClinVar LP/P variants associated with other neurological (n = 15541), neurodevelopmental (n = 9761), and neuropsychiatric (n = 321) phenotypes. Inheritance patterns for associated genes (autosomal recessive/autosomal dominant) along with the associated phenotype were validated using OMIM. Standardised clinical data included family history, site and age of onset, El Escorial category, survival, motor decline, and cognitive and behavioural assessments. Known ALS-gene variants and C9orf72 repeat expansion status were included for each cohort. Results: Non-ALS neurological variants were identified in 47/469 (10.0%) Irish and 69/774 (8.9%) Answer ALS participants, most frequently in hereditary spastic paraplegia-associated genes (3.2% Irish; 2.8% Answer ALS). Irish neurological variant carriers showed higher frequency of respiratory onset (10.6% vs 1.2%, Fisher's exact p = 0.002, φ = 0.20) and fewer premorbid behavioural symptoms (0.92 ± 0.56 vs 3.08 ± 0.97, Cohen's d = -0.40). Neurodevelopmental variants occurred in 12/469 (2.6%) Irish and 20/774 (2.6%) Answer ALS participants. 
In the Irish cohort, neurodevelopmental variant carriers had significantly shorter survival in Cox proportional hazards model (log-rank p = 0.005), corresponding to a more than two-fold increased hazard of death (HR = 2.25, 95% CI 1.26-4.00), and had significantly increased familial burden of neuropsychiatric disorders among first- and second-degree relatives (negative binomial IRR for carriers = 2.41, 95% CI: 1.12-5.18, p = 0.025). Across combined cohorts, 18 individuals (Irish n = 8; Answer ALS n = 10) carried ≥2 LP/P variants spanning ALS and non-ALS genes. Conclusion: Rare LP/P variants in genes associated with other neurological and neurodevelopmental disorders occur in up to 12% of pwALS across two independent cohorts. Carriers show distinct phenotypes, shorter survival, and characteristic family history patterns. These findings suggest that extended pleiotropic and oligogenic architectures may contribute to ALS heterogeneity.

12

Phenotype-Specific Recalibration of MAVE Data Enables Repurposing of BAP1 Functional Assays for Kury-Isidor Syndrome

Gupta, P.; Balton, E. V.; Tejura, M.; Kumar, R. D.; Snyder, M. W.; Stone, J.; Villani, R. M.; Peter, B. H.; Sirisak, C.; Ian, G. A.; Martha, H.-P.; Danny, M. E.; Jane, R.; Elisabeth, R. A.; Andrew, S. H.; Mark, W.; Undiagnosed Diseases Network (UDN); Kathleen, L. A.; Matthew, B. D.; Melissa, M. J.; Gail, J. P.; Katrina, D. M.; Elizabeth, B. E.; Fowler, D. M.; Starita, L. M.; McEwen, A. E.; Stergachis, A. B.

2026-05-21 genetic and genomic medicine 10.64898/2026.05.15.26352805 medRxiv

Top 0.1%

3.9%

Purpose: Multiplexed assays of variant effect (MAVEs) are transforming clinical variant interpretation. However, many genes are associated with more than one disease, making it unclear whether functional data generated in one disease context may be directly applicable to another. For example, germline BAP1 missense variants are associated with both BAP1 tumor predisposition syndrome (BAP1-TPDS) and Kury-Isidor syndrome (KURIS), a rare neurodevelopmental disorder. Here, we demonstrate how phenotype-specific calibration of BAP1 MAVE data enables disease-specific variant classification. Methods: Saturation genome editing (SGE) data for BAP1 were recalibrated using either BAP1-TPDS- or KURIS-associated missense variants as pathogenic controls. Functional evidence strength was quantified using the Odds of Pathogenicity (OddsPath) framework and mapped to ACMG/AMP PS3/BS3 criteria. Recalibrated functional evidence was integrated with standard clinical criteria for variant classification. A workshop was developed to teach phenotype-specific MAVE recalibration to clinicians and variant curators and evaluated for educational impact. Results: Phenotype-specific recalibration using BAP1-TPDS and KURIS controls yielded OddsPath values consistent with PS3_Strong evidence in both contexts. Application of KURIS-specific recalibration enabled the diagnosis of KURIS in an individual with a previously uncertain BAP1 missense variant. The educational workshop enabled quantitatively improved understanding in applying functional evidence. Conclusion: Phenotype-specific recalibration enables appropriately calibrated reuse of MAVE datasets across distinct disease contexts, increasing the clinical utility of MAVE datasets and the interpretability of variants in pleiotropic genes. This framework expands the diagnostic utility of existing functional datasets without requiring new experimental assays.
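
The abstract above maps OddsPath values to PS3/BS3 evidence strengths without giving the calculation. The sketch below assumes the commonly cited form of the OddsPath ratio and the threshold values usually attributed to the ClinGen Sequence Variant Interpretation recommendations; the function names, the thresholds as coded, and the example proportions are assumptions of this illustration and are not taken from the study.

```python
def odds_path(prior_pathogenic: float, posterior_pathogenic: float) -> float:
    """Odds of pathogenicity for an assay readout.

    prior_pathogenic (P1): proportion of pathogenic variants among all controls.
    posterior_pathogenic (P2): proportion of pathogenic variants among controls
    that share the readout in question (e.g. functionally abnormal).
    """
    for value in (prior_pathogenic, posterior_pathogenic):
        if not 0.0 < value < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    return (posterior_pathogenic * (1.0 - prior_pathogenic)) / (
        (1.0 - posterior_pathogenic) * prior_pathogenic
    )


def evidence_strength(odds: float) -> str:
    """Map an OddsPath value to an evidence label.

    Threshold values are assumed here, not quoted from the abstract.
    """
    if odds > 350.0:
        return "PS3_VeryStrong"
    if odds > 18.7:
        return "PS3_Strong"
    if odds > 4.3:
        return "PS3_Moderate"
    if odds > 2.1:
        return "PS3_Supporting"
    if odds < 0.053:
        return "BS3_Strong"
    if odds < 0.23:
        return "BS3_Moderate"
    if odds < 0.48:
        return "BS3_Supporting"
    return "Indeterminate"


if __name__ == "__main__":
    # Hypothetical control set: half of all controls are pathogenic, and 95%
    # of controls scoring as functionally abnormal are pathogenic.
    odds = odds_path(prior_pathogenic=0.5, posterior_pathogenic=0.95)
    print(f"OddsPath = {odds:.1f}, evidence = {evidence_strength(odds)}")
```

Under this framing, recalibrating for a different phenotype changes which variants count as pathogenic controls, so both proportions, and hence the resulting evidence strength, can differ between disease contexts.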

13

Detecting genomic regions enriched for reciprocal recombination in autism spectrum disorder

Mahoney, C. F.; Salter-Townshend, M.; Fitzpatrick, D. J.; Shields, D. C.

2026-05-27 genetics 10.64898/2026.05.26.727863 medRxiv

Top 0.1%

3.9%

Meiotic recombination is an important means of increasing genetic diversity by generating novel haplotypes in a population. Recombination separates linked loci extremely slowly in some regions, therefore genetic variants in high linkage disequilibrium may become co-adapted. Reciprocal recombination that separates co-adapted variants may generate a deleterious de novo haplotype that contributes to disease. We developed statistical methods to detect genomic regions of recombination excess in two different family-based study designs. We identified recombination in the Simons Simplex Collection in 273 simplex families with one child with autism spectrum disorder (ASD) and at least two unaffected children, in which recombinations can be mapped to the proband and contrasted with the recombination counts in unaffected siblings; and in 1,802 families with two children, where the number of recombinations identified can be contrasted with the expectation from a reference recombination map. Both strategies revealed a tail of low p-values for loci of interest that contrasted with the rest of the distribution. Permutation and bootstrap tests did not identify genome-wide primary findings in either cohort, but the most significant three-child cohort locus of recombination excess (between cadherin genes CDH4 and CDH26) replicated in the two-child cohort (p=0.01). While this replication strategy was not defined a priori, five of the most recombination enriched bins identified candidate ASD genes (p=0.02; WWOX, ADAMTS16, INSR, ADARB2, and HS6ST1). Since the six identified loci were not identified as regions of high de novo copy number variation in the study cohort and no CNVs were detected in any of the recombinant probands in the identified regions, they represent candidates for reciprocal recombinations generating unfavourable haplotypes for these genes. 
This study highlights a previously unidentified source of clinical genetic variability contributing to the molecular aetiology of ASD. Author summary: Autism spectrum disorder (ASD) is a constellation of neurodevelopmental disabilities characterised by deficits in social communication and repetitive patterns of behaviour. While ASD is highly heritable, its genetic basis is complex and poorly understood. While some highly penetrant types of genetic variation have been identified, most people with ASD carry a large number of variants that each contribute a small amount to their overall phenotype. In addition to mutations in individual genes, changes in the configuration of genes along a chromosome may contribute to ASD. Here, we describe a method for identifying regions where such new configurations have occurred through recombination and attempt to find regions where such changes are more common in autistic children than in their non-autistic siblings. We explore recombination as a source of genetic variation contributing to autism, which has potential to inform clinicians in providing services to autistic people and their families.
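The two-child design above contrasts an observed recombination count with the expectation from a reference map. A minimal sketch of that kind of contrast follows, assuming a simple Poisson model per genomic bin. The abstract does not specify the authors' statistical method, so the function name `poisson_upper_tail` and the example counts are hypothetical illustrations only.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    # Accumulate P(X = 1) ... P(X = observed - 1) iteratively.
    for i in range(1, observed):
        term *= expected / i
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical bin: 9 recombinations observed where the reference map
# predicts 3 on average (made-up numbers, not taken from the study).
p_value = poisson_upper_tail(observed=9, expected=3.0)
print(f"upper-tail p-value: {p_value:.4f}")
```

A genome-wide scan would repeat this per bin and then need a multiple-testing correction, which the abstract addresses through permutation and bootstrap tests.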

14

Large-scale association study identifies lung cancer susceptibility copy number variants and their potential functional role in genetic instability

Xiao, F.; Qin, F.; Luo, X.; Slewitzke, S. E.; Fernandes, G. F.; Johansson, M.; Xiao, X.; Zaridze, D.; Bojesen, S. E.; Shete, S.; Albanes, D.; Aldrich, M. C.; Tardon, A.; Fernandez-Tardon, G.; Le Marchand, L.; Rennert, G.; Bickeböeller, H.; Wichmann, H.-E.; Risch, A.; Muley, T.; Rosenberger, A.; Field, J. K.; Davies, M.; Woll, P.; Kiemeney, L. A.; Haugen, A.; Zienolddiny, S.; Lam, S.; Johansson, M.; Grankvist, K.; Schabath, M. B.; Andrew, A.; Lazarus, P.; Arnold, S. M.; Zhu, D.; Brenner, H.; Neuhouser, M. L.; Hung, R. J.; Christiani, D. C.; McKay, J.; Cai, G.; Xia, J.; Amos, C. I.

2026-05-15 genetic and genomic medicine 10.64898/2026.05.11.26352741 medRxiv

Top 0.1%

3.7%

Background: Genome-wide association studies (GWAS) have identified numerous lung cancer susceptibility loci based on single nucleotide polymorphisms (SNPs), yet a substantial proportion of heritability remains unexplained. We therefore evaluated germline copy number variants (CNVs) as an underexplored source of genetic susceptibility and potential contributors to genomic instability in lung cancer. Methods: We conducted a genome-wide analysis of germline CNVs using 19,342 cases and 15,917 controls from the Transdisciplinary Research in Cancer of the Lung (TRICL) consortium, with replication in two independent cohorts. High-confidence CNVs were identified by integrating two CNV callers including PennCNV and modSaRa2. Association analyses were performed using both gene-based and CNV region-based approaches. Polygenic risk scores (PRS) were constructed from top loci, and functional validation was conducted using siRNA-mediated knockdown in lung fibroblast cells. Results: We identified CNVs in four genomic regions (1p36.22, 2q31.2, 6p21.32, and 19q13.32) significantly associated with lung cancer risk. Two loci (1p36.22 and 2q31.2) were consistently supported across both analytical strategies. A CNV-based PRS constructed from key genes (CLCN6, NFE2L2, OPA3, and PSMB8) was significantly associated with lung cancer risk and replicated across independent datasets. Functional assays demonstrated that knockdown of NFE2L2 and OPA3 increased endogenous DNA damage, supporting a role in genomic stability. Conclusions: Germline CNVs contribute to lung cancer susceptibility and may influence carcinogenesis through mechanisms related to genomic instability. Impact: These findings expand the genetic architecture of lung cancer and highlight CNVs as potential biomarkers for improving risk stratification and informing precision prevention strategies.

15

PAVS: A Standardized Database of Phenotype-Associated Variants from Saudi Arabian Rare Disease Patients

Abdelhakim, M.; Althagafi, A.; SCHOFIELD, P.; Hoehndorf, R.

2026-04-06 genetic and genomic medicine 10.64898/2026.04.05.26350189 medRxiv

Top 0.1%

3.7%

Genotype-phenotype databases are essential for variant interpretation and disease gene discovery. Genetic variation differs among human populations, mainly in allele frequencies and haplotype patterns shaped by ancestry and demographic history. Population-specific genotypes can influence traits and disease risk; this makes population-specific characterization important. Most existing resources focus on the characterization of a population's genetic background, but do not represent the resulting phenotypes. We have developed PAVS (Phenotype-Associated Variants in Saudi Arabia), a curated, publicly accessible database that integrates 5,132 Saudi clinical cases from four Saudi cohorts and 522 cases from analysis of a mixed-population cohort, together with 1,856 cases from the Deciphering Developmental Disorders study (DDD) and 9,588 literature phenopackets. Each case record describes patient-level phenotypes, encoded with the Human Phenotype Ontology (HPO), and links them to genomic variants, gene identifiers, zygosity, pathogenicity classifications, and disease diagnoses mapped to standardized disease terminologies. The data is represented in Phenopackets format and as a knowledge graph in RDF. Additionally, a web interface provides phenotype-based similarity search, gene and variant browsers, and an HPO hierarchy explorer. We evaluate the utility of the phenotype annotations for gene prioritization using semantic similarity. While there are clear differences from global literature-curated databases, phenotypes in PAVS can successfully rank the correct gene highly (ROC AUC: 0.89). PAVS addresses a gap in population-specific genotype-phenotype resources and provides a benchmark for phenotype-driven variant prioritization in under-represented populations.

16

Health Impact Assessment of BRCA1/2 Cascade Screening for the Personalized Prevention of Hereditary Breast and Ovarian Cancers in Italy

Valz Gris, A.; Giacobini, E.; Tricomi, V.; Rumi, F.; Valentini, I.; Cristiano, A.; Testa, S.; Rosano, A.; Pezzullo, A. M.; Boccia, S.

2026-04-15 public and global health 10.64898/2026.04.13.26350758 medRxiv

Top 0.1%

3.6%

Introduction: Pathogenic germline variants in the BRCA1 and BRCA2 genes confer a markedly increased risk of breast and ovarian cancer, for which effective preventive strategies are available. Although national and international guidelines recommend BRCA testing and cascade screening of relatives, implementation in Italy remains highly heterogeneous across regions. This study estimates the potential population health and cost impact of achieving full nationwide implementation of BRCA1/2 cascade screening in Italy and identifies key organisational barriers and priority actions for implementation. Methods: We conducted a Health Impact Assessment integrating literature review, simulation modelling, and stakeholder consultation. A decision tree and Markov model compared the current heterogeneous implementation of BRCA screening in Italy with an ideal scenario reflecting full adherence to national guidelines, optimal cascade screening, and uptake of preventive strategies. Outcomes included breast and ovarian cancer incidence and mortality, and healthcare costs over a lifetime horizon (80 years). Key barriers affecting organisational feasibility, acceptability, and patient well-being were assessed, and a set of priority action recommendations was developed. Results: In the ideal scenario, 25,626 eligible cancer patients would undergo BRCA testing annually, identifying 4,254 mutation carriers and enabling cascade testing of 27,650 relatives, of whom 8,682 would be BRCA-positive. Under the current implementation, only 8,807 patients and 2,168 relatives are tested, identifying 948 carriers. Over 30 years, full implementation would prevent 821 cancer cases (-27.9%) and 1,282 deaths (-49.7%) compared with the current scenario. While initial expenditures increase due to expanded testing and preventive interventions, cumulative costs decrease over time, resulting in net savings of €5.8 million at 30 years and a saving per event avoided (-€2,779). 
Major implementation barriers include fragmented governance, limited access to genetic counselling, heterogeneous laboratory practices, insufficient professional training, and weak referral pathways. Conclusion: Full implementation of BRCA1/2 cascade screening in Italy would yield substantial population health benefits and long-term cost savings. Coordinated national governance, standardized pathways, investment in counselling and workforce capacity, and robust monitoring systems are essential to ensure equitable access and sustainable delivery of personalized cancer prevention. This study demonstrates the value of the HIA methodology for evaluating and guiding genomic prevention policies.
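The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that "events avoided" means prevented cancer cases plus prevented deaths; the abstract does not state the denominator, so that definition and all variable names are assumptions made for illustration.

```python
# Figures quoted in the abstract (30-year horizon, ideal vs current scenario).
net_savings_eur = 5.8e6    # reported net savings, rounded to 0.1 million
cases_prevented = 821
deaths_prevented = 1282

# Assumption: "events avoided" = cases prevented + deaths prevented.
events_avoided = cases_prevented + deaths_prevented
saving_per_event = net_savings_eur / events_avoided

# Carrier yield in the ideal scenario, from the quoted counts.
ideal_patient_yield = 4254 / 25626    # carriers among tested patients
ideal_relative_yield = 8682 / 27650   # BRCA-positive among tested relatives

print(f"events avoided: {events_avoided}")
print(f"saving per event avoided: EUR {saving_per_event:,.0f}")
print(f"patient carrier yield: {ideal_patient_yield:.1%}")
print(f"relative carrier yield: {ideal_relative_yield:.1%}")
```

Under this assumption the per-event saving comes out within about 1% of the reported €2,779, a gap that could be explained by the rounding of the €5.8 million total.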

17

Challenges and perspectives in implementing whole-exome sequencing in Algeria: lessons from a fully autonomous in-country cohort

AIT MOUHOUB, T.; BELADGHAM, K.; BRAHIMI, S.; GAGI, N.; MIHOUBI, A.; MOUTCHACHOU, H.; BOUABID, M. E. A.; BELAID, A.; YAHIAOUI, S.; BELAZZOUGUI, D.; IMESSAOUDENE, B.

2026-03-25 genetic and genomic medicine 10.64898/2026.03.23.26348909 medRxiv

Top 0.1%

3.6%

Despite the multidimensional value of implementing genomic medicine, in terms of diagnostic yield, cost-effectiveness, and optimisation of care trajectories, its deployment in many African countries, including Algeria, remains constrained by major structural and interpretive challenges, compounded by the persistent underrepresentation of African populations in genomic databases with direct consequences for variant interpretation and clinical decision-making. We implemented a fully in-house whole-exome sequencing (WES) workflow structured through a clinically driven sequential framework in 14 unrelated patients with unexplained neurodevelopmental disorders, in a context of high consanguinity and enriched recessive inheritance. A definitive molecular diagnosis was established in 8 cases, with pathogenic or likely pathogenic variants identified in MECP2, PTPN11, FOXG1, ARV1, GNAO1, ATM, ROBO3, and CHD3. Five cases yielded variants of uncertain significance and one clinically relevant incidental finding was identified. Beyond its diagnostic contribution, this study reveals persistent interpretive limitations: a disproportionate VUS burden, complex incidental finding management, and reduced accessibility to classification criteria, reflecting database underrepresentation, the predominance of private variants, and the limits of current frameworks in consanguineous settings. These findings underscore the necessity of population-specific reference datasets, iterative phenotyping, adapted ethical frameworks, and strategies addressing territorial disparities in access. This work demonstrates that WES implementation requires a structured multidisciplinary ecosystem integrating clinical, bioinformatic, and ethical dimensions, and provides a transferable model for the sustainable integration of genomic medicine in under-resourced settings, while highlighting the global scientific value of incorporating underrepresented populations into genomic research.

18

Genome-wide detection and clinical prioritization of tandem repeat outliers using long-read sequencing

Gibson, S. B.; Damaraju, N.; Gustafson, J. G.; Balton, E. V.; Chanprasert, S.; Glass, I. A.; Horike-Pyne, M.; Kumar, R. D.; Leppig, K. A.; Lundberg, C.; Ranchalis, J.; Rosenthal, E. A.; Solomon, A. K.; Stergachis, A. B.; Wener, M.; UDN; Jarvik, G. P.; Blue, E. E.; Dipple, K. M.; Dashnow, H.; Starita, L. M.; Miller, D. E.

2026-05-01 genetic and genomic medicine 10.64898/2026.04.30.26352103 medRxiv

Top 0.1%

3.6%

Background: Tandem repeat expansions (TREs) cause over 60 known neurological, neuromuscular, and developmental disorders. Detecting these expansions genome-wide is challenging due to their size, sequence complexity (including interruptions), and population variation. While long-read sequencing is an emerging technology that can fully resolve many TREs, no methods have been described for genome-wide identification and prioritization of candidate pathogenic TREs with this technology. Methods: Using a newly developed pipeline called TRoLR (Tandem Repeat outliers identified with Long Reads), we analyzed haplotype-resolved long-read genome assemblies from 471 ancestrally diverse individuals to define population distributions for over three million tandem repeat loci, capturing clinically relevant interruptions. Outlier expansions were identified relative to these distributions and prioritized by genomic location and comparison to known pathogenic loci. The framework was applied to 47 cases from the Undiagnosed Diseases Network. Results: Population stratification of repeat metrics was observed at 7% of loci, with highest variability among individuals of African ancestry. Outlier analysis confirmed known pathogenic CNBP and ATXN8OS expansions, detected carrier-range alleles at RFC1, CSTB, and FXN, and revealed a novel CGG expansion in the 5' UTR of PCMTD2 exhibiting hypermethylation and intergenerational instability. Genome-wide screening also identified intronic pentanucleotide expansions at IQCB1 and MAP3K15 in controls composed of motifs that have been associated with pathogenicity at other disease loci. Conclusions: Quantifying the longest uninterrupted repeat segment in long-read assemblies enables detection of clinically relevant repeat expansions and loss of stabilizing interruptions. 
This approach enhances both diagnostic confirmation and discovery of candidate pathogenic expansions, with implications for clinical interpretation and research into complex repeat-mediated disorders.
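The metric named in the conclusions, the longest uninterrupted repeat segment, can be illustrated with a short sketch. This is not the TRoLR implementation, whose details the abstract does not give. The function name `longest_uninterrupted_run` and the example allele are hypothetical, and the regex scan is an assumed simplification.

```python
import re


def longest_uninterrupted_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive, uninterrupted motif copies."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    motif = motif.upper()
    # Greedy, non-overlapping scan for back-to-back copies of the motif.
    # Adequate for motifs that do not overlap themselves (e.g. CGG);
    # self-overlapping motifs would need a more careful scan.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        best = max(best, len(match.group(0)) // len(motif))
    return best


# Hypothetical allele: three CGG copies, one AGG interruption, five CGG copies.
allele = "CGG" * 3 + "AGG" + "CGG" * 5
print(longest_uninterrupted_run(allele, "CGG"))
```

In this toy allele the AGG interruption splits nine repeat units into runs of three and five, so the metric reports the longer pure run and not the total unit count. Losing such an interruption would lengthen the pure run without changing the overall allele length much.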

19

Bulk RNA sequencing deconvolution of pancreatic ductal adenocarcinoma identifies cancer-associated fibroblast subsets associated with survival and tumor microenvironment composition

Dam, N.; Steketee, M. F. B.; Strijk, G.; Koning, W. d.; Hawinkels, L. J. A. C.; Kemp, V.; Eijck, C. H. J. v.; Kim, Y.; Eijck, C. W. F. v.; Os, B. W. v.

2026-04-06 cancer biology 10.64898/2026.04.03.716260 medRxiv

Top 0.1%

3.6%

Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer characterized by a high abundance of cancer-associated fibroblasts (CAFs), which influence therapy response, tumor biology and tumor aggressiveness. CAFs are a heterogeneous cell type and previous single-cell RNA sequencing (scRNAseq) of PDAC tumors identified three main CAF subtypes: myofibroblastic, inflammatory and antigen-presenting CAFs (myCAF, iCAF, apCAF, respectively). However, scRNAseq on large patient cohorts is often not feasible due to costs and technical constraints. Therefore, bulk RNAseq deconvolution can be used to identify cell types within the heterogeneous tumor microenvironment. Here, Statescope deconvolution was used to identify different cell types of the tumor microenvironment within an early onset PDAC cohort, comprising 74 patients aged under 60. Three CAF populations were identified (iCAFs, myCAFs and desmoplastic CAFs), and their correlations with tumor microenvironment components, mutational signatures and survival were examined. iCAFs were associated with classical-like tumor cells, whereas myCAFs and desmoplastic CAFs correlated with basal-like tumor cells. Desmoplastic CAFs are associated with inflammatory granulocytes/neutrophils, while negatively associating with monocyte-derived macrophages and immature/transitional B cells. No associations were observed between mutational signatures and the abundance of CAF and epithelial tumor subtypes. Interestingly, a high abundance of CAFs, and specifically increased iCAF abundance, was associated with improved survival. This iCAF-mediated survival effect was predominantly apparent in female patients. All in all, deconvolution of bulk RNA sequencing data, followed by its integration with clinical and biological parameters, reveals the heterogeneity and prognostic implications of CAF subpopulations in the tumor microenvironment of early onset PDAC patients.

20

Evaluating splicing factor and kinase network crosstalk through global phosphoproteomics

Crowl, S.; Singh, S.; Zhang, T.; Naegle, K. M.

2026-04-21 systems biology 10.64898/2026.04.16.718710 medRxiv

Top 0.1%

3.6%

Both splicing and kinase signaling are biochemical processes that fundamentally determine and shape cell physiology. Although there has been some indication that there is an interaction between the two - splicing can alter the availability of exons encoding kinase targets and kinases can phosphorylate splicing factors - it has yet to be established the extent to which altering splicing factor expression impacts kinase signaling networks. In this work, we implemented a data-driven analysis using ENCODE RNA-sequencing data and prior work mapping post-translational modifications onto splice events to identify candidate splice factor perturbations that show extensive alterations to phosphorylation-encoding protein products. We then replicated the ENCODE knockdown experiments and performed global phosphoproteomics for two candidates, U2AF1 and SRSF3, complementing the transcription-level data. Both knockdowns showed extensive changes in phosphorylation and kinase activities, both basally and upon receptor tyrosine kinase stimulation. U2AF1 knockdown drove decreased JNK-associated cell death signaling but elevated chromosome regulation through CSNK2A1, PLK, and EIF2AK4 activity. SRSF3 knockdown, on the other hand, led to decreased cell cycle signaling through CDK and HIPK2 but increased cytoskeletal signaling through various PAKs. In addition, we found a striking enrichment of phosphorylated splicing regulators in both knockdowns that were linked to their splicing activity, such as HNRNPC, suggesting potential feedback and crosstalk between splice factors through signaling pathway activation. Importantly, comparison of differential phosphorylation measurements from this study to mRNA expression and splicing measurements from ENCODE revealed significant knockdown-dependent protein regulation, not captured by transcriptomic measurements alone, underscoring the value of phosphoproteomic profiling after splice factor perturbations. 
Combined, the transcriptomics and phosphoproteomics reveal deep interconnection between the two processes that are relevant to understanding cell signaling in health and disease.